Statistics for Political Science
August 7, 2025
R Patrick Buhr Fifth-year Ph.D. Candidate studying U.S. Congress and presidency.
Mason Auten Third-year Ph.D. Student studying historical state formation.
You are interested in asking and answering questions about politics.
Quantitative methods can help answer questions of this kind.
Foundations of quantitative analysis:
Description: Summarizing and understanding the characteristics of the data we have.
Inference: Making generalizations about a population based on sample data.
Prediction: Forecasting future events or behaviors based on existing data patterns.
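As a rough illustration of the three tasks, here is a minimal sketch in R, the language this course uses. The data frame `sample_data` and its columns `education_years` and `approval` are invented for illustration only; they are not real survey results.

```r
# Made-up sample: years of education and an approval rating (0-100)
# for eight hypothetical respondents.
sample_data <- data.frame(
  education_years = c(10, 12, 12, 14, 16, 16, 18, 20),
  approval        = c(35, 42, 40, 50, 55, 58, 61, 70)
)

# Description: summarize the data we have.
sample_mean <- mean(sample_data$approval)

# Inference: a 95% confidence interval for the population mean,
# treating these respondents as a random sample.
approval_ci <- t.test(sample_data$approval)$conf.int

# Prediction: fit a line to the existing pattern, then forecast approval
# for a new respondent with 15 years of education.
fit <- lm(approval ~ education_years, data = sample_data)
predicted_approval <- predict(fit, newdata = data.frame(education_years = 15))

sample_mean
approval_ci
predicted_approval
```

The same data serve all three purposes; what changes is the question being asked of them.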
For quantitative analysis, we need to operationalize a concept into a numerical representation.
Occasionally a variable will lend itself well to quantification: income, voter turnout, number of bills introduced by a Member of Congress, hours of cable television a person watches.
Other times, quantification is easy, but there is dispute over which measure is best: for example, GDP, GDP per capita, or the Human Development Index.
“Likert” scale, 7-point:
1. Very Liberal
2. Liberal
3. Somewhat Liberal
4. Moderate/Middle of the Road
5. Somewhat Conservative
6. Conservative
7. Very Conservative
Measures both direction (Liberal vs. Conservative) as well as intensity (Somewhat vs. Very).
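One way to store a scale like this in R is as an ordered factor, which keeps both the label and its rank. The sketch below uses made-up response codes; the helper variables `direction` and `intensity` are illustrative names, not part of any standard survey coding.

```r
# Hypothetical responses to the 7-point item, coded 1-7 (made-up values).
ideology_codes <- c(2, 4, 7, 4, 5, 1)

ideology_labels <- c(
  "Very Liberal", "Liberal", "Somewhat Liberal",
  "Moderate/Middle of the Road",
  "Somewhat Conservative", "Conservative", "Very Conservative"
)

# An ordered factor records each label together with its position on the scale.
ideology <- factor(ideology_codes, levels = 1:7,
                   labels = ideology_labels, ordered = TRUE)

# Direction: which side of the midpoint (4) a response falls on.
direction <- ifelse(ideology_codes < 4, "Liberal",
             ifelse(ideology_codes > 4, "Conservative", "Moderate"))

# Intensity: distance from the midpoint (0 = moderate, 3 = "Very").
intensity <- abs(ideology_codes - 4)

data.frame(ideology, direction, intensity)
```

Splitting the code into direction and intensity is only one possible reading of the scale; the ordered factor itself makes no assumption that the steps between categories are equal.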
Can we use the same Likert scale?
On a scale of 1 to 5, with 1 being “strongly disagree” and 5 being “strongly agree,” how much do you agree with the following statements:
Scores are averaged to create a composite racial resentment index.
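Averaging item scores into an index might look like the following in R. This is a sketch under stated assumptions: the four columns `item1` to `item4` and their values are hypothetical, and every item is assumed to be already coded so that higher values point in the same direction (any reverse-worded item would need to be flipped first).

```r
# Hypothetical 1-5 agreement scores on four survey items (made-up data).
survey <- data.frame(
  respondent = c("A", "B", "C"),
  item1 = c(5, 2, 4),
  item2 = c(4, 1, NA),  # respondent C skipped this item
  item3 = c(5, 2, 3),
  item4 = c(4, 3, 5)
)

item_cols <- c("item1", "item2", "item3", "item4")

# Average across items within each respondent (row).
# na.rm = TRUE averages over the items a respondent actually answered.
survey$resentment_index <- rowMeans(survey[, item_cols], na.rm = TRUE)

survey
```

Whether to average over answered items or drop respondents with missing items is a design choice; `na.rm = TRUE` is used here only to keep the example short.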
Every quantitative paper in political science answers some variation of the following question:
How (or why) does \(x\) affect \(y\)?
Every causal statement has the counterfactual of “if \(x\) had been different, then \(y\) would have been different too.”
We study units.
Units have attributes.
Variables are logical groupings of mutually exclusive attributes.
Dataframes are a structured way to organize variables.
In this course, we will only use dataframes. The major alternative to dataframes is lists, which are used primarily in engineering and computer science.
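A minimal sketch of these ideas in R: each row of a dataframe is a unit, each column is a variable, and each cell holds one attribute. The Members of Congress and their values below are hypothetical.

```r
# Each row is a unit (a hypothetical Member of Congress);
# each column is a variable; each cell is that unit's attribute.
members <- data.frame(
  name             = c("Member A", "Member B", "Member C"),
  party            = c("Democrat", "Republican", "Independent"),
  bills_introduced = c(12, 7, 3),
  stringsAsFactors = FALSE
)

str(members)     # structure: one line per variable
nrow(members)    # number of units
members$party    # one variable, across all units

# Units with more than five bills introduced.
active_members <- members[members$bills_introduced > 5, ]
active_members
```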
Definition: Nominal variables classify data into distinct categories without any inherent order or ranking.
Examples:
- Political Party Affiliation: Democrat, Republican, Independent, Other
- Country of Origin: United States, Canada, Mexico
- Race: Black, White, AAPI, Latino/a
Definition: Ordinal variables have categories that can be ranked in a meaningful order, but the differences between categories are not necessarily equal.
Examples:
Definition: Continuous (sometimes called numeric) variables take ordered numeric values with equal intervals between them.
Examples:
Definition: Binary (sometimes called Boolean) variables take only two values: 1 or 0 (or TRUE/FALSE).
Examples:
What type of variable is each of the following (nominal/categorical, ordinal, continuous, binary)?
Religious affiliation (e.g. Catholic, Protestant, Muslim, Jewish, None).
Nominal (categorical), because the categories are not ranked.
Unemployment rate in a Member of Congress’s district (percentage of constituents who are unemployed).
Continuous, because the distance between numbers is consistent.
Political interest (e.g. not interested, slightly interested, fairly interested, very interested).
Ordinal, because the categories have a ranking.
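The four variable types map fairly naturally onto R's basic data types. The sketch below uses made-up values; `voted` is a hypothetical binary variable added to round out the set.

```r
# Nominal: an unordered factor.
religion <- factor(c("Catholic", "None", "Muslim"))

# Ordinal: an ordered factor, with levels listed from lowest to highest.
interest <- factor(
  c("very interested", "not interested", "fairly interested"),
  levels = c("not interested", "slightly interested",
             "fairly interested", "very interested"),
  ordered = TRUE
)

# Continuous: a numeric vector (district unemployment rates, in percent).
unemployment_rate <- c(3.9, 6.2, 4.5)

# Binary: a logical vector (hypothetical: did the respondent vote?).
voted <- c(TRUE, FALSE, TRUE)

# Rank comparisons are meaningful for ordered factors.
interest[1] > interest[3]

# Arithmetic is meaningful for continuous variables.
mean(unemployment_rate)

# The mean of a logical vector is the share of TRUE values.
mean(voted)
```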
Often, we can represent concepts in different ways.
What would be the best way to operationalize education?
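To make the question concrete, here is a sketch of three ways the same concept could be represented in R. The years of schooling are made up, and the cutoffs (11, 15, and 16 years) are illustrative assumptions rather than a standard coding scheme.

```r
# Continuous: years of schooling for four hypothetical respondents.
years_of_schooling <- c(10, 12, 16, 20)

# Ordinal: collapse years into ranked levels.
# Cutoffs are assumptions: up to 11 years, 12-15 years, 16 or more.
education_level <- cut(
  years_of_schooling,
  breaks = c(-Inf, 11, 15, Inf),
  labels = c("Less than high school", "High school", "College or more"),
  ordered_result = TRUE
)

# Binary: does the respondent have at least 16 years of schooling?
college_degree <- years_of_schooling >= 16

data.frame(years_of_schooling, education_level, college_degree)
```

Each version discards or keeps different information, so the appropriate choice depends on the research question.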
R is a powerful and widely used programming language for statistical analysis.
It is free, open-source, and widely supported by researchers and data analysts.
R provides extensive libraries for data visualization, regression modeling, and machine learning.
Learning R enhances your ability to conduct independent research and analyze real-world data.
Principle One: “Write code for humans.”
Principle Two: “Let the computer do the work.”
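One possible reading of the two principles, sketched in R with made-up turnout figures: descriptive names and comments serve the human reader, and a single call applied to every column replaces copy-and-paste repetition. The data frame `turnout` and its columns are hypothetical.

```r
# Hypothetical turnout rates (percent) for three counties in two elections.
turnout <- data.frame(
  county       = c("County A", "County B", "County C"),
  turnout_2020 = c(61.2, 58.4, 70.1),
  turnout_2024 = c(59.8, 60.0, 68.5)
)

# Principle One: names and comments say what the code is for.
turnout_columns <- c("turnout_2020", "turnout_2024")

# Principle Two: one call computes the mean of every listed column,
# instead of typing mean(turnout$turnout_2020), mean(turnout$turnout_2024), ...
average_turnout <- sapply(turnout[turnout_columns], mean)

average_turnout
```

If more election years are added later, only `turnout_columns` needs to change.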
Types of problems:
Solving problems: